Chronic Kidney Disease

22160 - R for Bio Data Science

Group 15

Introduction

The dataset contains 25 features related to chronic kidney disease (CKD), collected from 400 individuals in India. In addition to the CKD diagnosis, it includes information on co-diagnoses:

  • Hypertension

  • Diabetes

  • Anemia

  • Pedal edema

  • Coronary artery disease

Analysis goal

Can we identify any physiological markers which are related to a chronic kidney disease diagnosis? If so, which ones?

Methods

Data cleaning and augmentation were done using the tidyverse collection of packages.

  • Cleaning: Renaming columns and fixing variable types.

  • Augmenting: Dividing into age groups, splitting and joining, and estimating glomerular filtration rate (GFR).
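The cleaning and augmenting steps listed above can be sketched roughly as follows. This is a minimal illustration, not the group's actual pipeline: the toy data frame, the original and renamed column names, and the age-group boundaries are all assumptions made for the example.

```r
library(dplyr)

# Hypothetical raw data: terse column names, everything read in as text.
raw <- data.frame(
  age   = c("48", "7", "62", "51"),
  sc    = c("1.2", "0.8", "1.8", "3.9"),
  htn   = c("yes", "no", "no", "yes"),
  class = c("ckd", "notckd", "ckd", "ckd")
)

clean <- raw %>%
  # Cleaning: give columns descriptive names (names are illustrative).
  rename(serum_creatinine = sc, hypertension = htn, diagnosis = class) %>%
  mutate(
    # Cleaning: fix variable types.
    age              = as.numeric(age),
    serum_creatinine = as.numeric(serum_creatinine),
    hypertension     = hypertension == "yes",
    diagnosis        = factor(diagnosis, levels = c("notckd", "ckd")),
    # Augmenting: assign age groups (boundaries are an assumption).
    age_group = cut(
      age,
      breaks = c(0, 18, 40, 60, Inf),
      labels = c("0-17", "18-39", "40-59", "60+"),
      right  = FALSE
    )
  )

print(clean)
```

Setting `right = FALSE` makes each interval include its lower bound, so an 18-year-old falls in "18-39" rather than "0-17".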

We conducted a correlation analysis and trained a random forest model to identify which biomarkers best predict a CKD diagnosis. For this, we used the PerformanceAnalytics and randomForest packages.
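As a rough sketch of how these two packages can be combined, the example below runs a correlation chart and a random forest on simulated stand-in data. The data, column names, and settings such as `ntree = 500` are assumptions for illustration only; they are not the CKD dataset or the group's actual model.

```r
library(PerformanceAnalytics)
library(randomForest)

set.seed(1)
n <- 200

# Simulated stand-in data (NOT the CKD dataset):
# two informative markers and one pure-noise column.
diagnosis <- factor(rep(c("notckd", "ckd"), each = n / 2),
                    levels = c("notckd", "ckd"))
is_ckd <- diagnosis == "ckd"
markers <- data.frame(
  hemoglobin       = rnorm(n, mean = ifelse(is_ckd, 10, 15), sd = 1.5),
  serum_creatinine = rnorm(n, mean = ifelse(is_ckd, 3, 1), sd = 0.3),
  noise            = rnorm(n)
)

# Correlation analysis: scatter plots, histograms, and pairwise correlations.
chart.Correlation(markers, histogram = TRUE, method = "pearson")

# Random forest classifier with permutation-based variable importance.
rf  <- randomForest(x = markers, y = diagnosis, ntree = 500, importance = TRUE)
imp <- importance(rf, type = 1)  # type 1 = mean decrease in accuracy

# Rank predictors from most to least important.
print(imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE])
```

Mean decrease in accuracy measures how much prediction error grows when a variable's values are permuted, so a variable unrelated to the diagnosis should rank near the bottom.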

Results - Kidney disease stages

Using the equation below, we estimated GFR and, from it, the CKD stage of each individual. Because the dataset lacks sex information, we averaged the male and female GFR estimates.

\[
\text{eGFR}_{\text{cr}} = 142 \times \min\left(\frac{\text{Scr}}{\kappa},\, 1\right)^{\alpha} \times \max\left(\frac{\text{Scr}}{\kappa},\, 1\right)^{-1.200} \times 0.9938^{\text{Age}} \times 1.012 \;\; \text{[if female]}
\]

Here Scr is serum creatinine in mg/dL, \(\kappa\) is 0.7 for females and 0.9 for males, and \(\alpha\) is -0.241 for females and -0.302 for males.
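A minimal sketch of this calculation in R is shown below. The formula appears to be the 2021 CKD-EPI creatinine equation, and its sex-specific constants are used here on that assumption. The function names are hypothetical, and the stage cut-offs follow the usual KDIGO GFR categories, which the slides do not state explicitly.

```r
# eGFR from serum creatinine (scr, mg/dL) and age (years).
# Constants assume the 2021 CKD-EPI creatinine equation.
egfr_ckd_epi <- function(scr, age, female) {
  kappa <- ifelse(female, 0.7, 0.9)
  alpha <- ifelse(female, -0.241, -0.302)
  142 *
    pmin(scr / kappa, 1)^alpha *
    pmax(scr / kappa, 1)^(-1.200) *
    0.9938^age *
    ifelse(female, 1.012, 1)
}

# With no sex information, average the female and male estimates.
egfr_sex_averaged <- function(scr, age) {
  (egfr_ckd_epi(scr, age, female = TRUE) +
     egfr_ckd_epi(scr, age, female = FALSE)) / 2
}

# Map eGFR (mL/min/1.73 m^2) to a GFR category.
# Cut-offs are assumed: G1 >= 90, G2 60-89, G3a 45-59,
# G3b 30-44, G4 15-29, G5 < 15.
ckd_stage <- function(egfr) {
  cut(
    egfr,
    breaks = c(-Inf, 15, 30, 45, 60, 90, Inf),
    labels = c("G5", "G4", "G3b", "G3a", "G2", "G1"),
    right  = FALSE
  )
}

# Usage on illustrative values (not taken from the dataset).
egfr <- egfr_sex_averaged(scr = c(0.8, 1.5, 4.0), age = c(30, 55, 70))
print(data.frame(egfr = round(egfr, 1), stage = ckd_stage(egfr)))
```

Averaging the two sex-specific estimates keeps every individual in the analysis, at the cost of a systematic error for each person whose true sex-specific value differs from the mean.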

Results - random forest prediction of correlation

Results - CKD and secondary diagnoses

Hypertension and diabetes were present only in individuals with a CKD diagnosis.

Results - Key predictors of CKD diagnosis

Discussion

Findings:

  • The random forest (RF) model identified the explanatory variables most predictive of CKD.

  • CKD is best predicted by albumin in urine, hemoglobin concentration, packed cell volume (PCV), red blood cell (RBC) count, and creatinine levels.

  • The GFR estimate aligns with the CKD diagnosis.

  • Hypertension and diabetes are more common in those with CKD.

  • In this dataset, many patients had severe CKD.

Discussion

Caveats and possible improvements:

  • Data did not need extensive cleaning.

  • The GFR estimate was made without information on sex, which reduces its accuracy.

  • More information on the data source is needed for more accurate conclusions.

    • E.g., was the data collected in a hospital, including for those without CKD?

References

Data source

Information/theory sources

Packages